Text Coherence Analysis based on Misspelling Oblivious Word Embeddings and Deep Neural Network

Abstract

Text coherence analysis is a more challenging task in Natural Language Processing (NLP) than other subfields such as text generation, translation, or summarization. Many coherence methods exist, most of them graph-based or entity-based and designed for short documents. For long documents, however, existing methods achieve low accuracy, which is the biggest challenge for both English and Bengali. This is because they do not consider misspelled words in a sentence and therefore cannot accurately assess coherence. In this paper, a method is proposed based on the Misspelling Oblivious Word Embedding Model (MOEM) and a deep neural network. The MOEM model replaces all misspelled words with correct ones and captures the interaction between different sentences by calculating their matches using word embeddings. A deep neural network architecture is then used to train and test the model. The study examines two datasets, one in Bengali and one in English, to analyze text consistency based on the sequence of activities and to evaluate the effectiveness of the model. One dataset contains 7121 documents, of which 5696 (80%) are used for training and 1425 (20%) for testing. The other contains 7500 documents, of which 6000 (80%) are used for training and 1500 (20%) for evaluation. The efficiency of the proposed model is compared with existing techniques. Experimental results show that it significantly improves automatic coherence detection, with reported accuracies of 98.1% and 89.67% on the two datasets. Finally, comparisons with other models are shown for both datasets.
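
The two steps the abstract names (replacing misspelled words, then matching sentences through word embeddings) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's implementation: the toy vocabulary and its hand-made 3-dimensional vectors, the use of closest-string matching from Python's difflib as a stand-in for MOEM, and the mean cosine similarity of adjacent sentences as a stand-in for the deep neural network.

```python
import difflib
import math
import re

# Hypothetical toy vocabulary with hand-made vectors (illustrative values only).
TOY_VECTORS = {
    "the": [0.1, 0.1, 0.1],
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "sleeps": [0.1, 0.9, 0.0],
    "runs": [0.2, 0.8, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "market": [0.0, 0.2, 0.8],
    "falls": [0.1, 0.3, 0.7],
}


def normalize_word(word):
    """Map a possibly misspelled word to its closest vocabulary entry, or None.

    Stand-in for MOEM: closest-string matching, not a learned embedding.
    """
    word = word.lower()
    if word in TOY_VECTORS:
        return word
    matches = difflib.get_close_matches(word, list(TOY_VECTORS), n=1, cutoff=0.6)
    return matches[0] if matches else None


def sentence_vector(sentence):
    """Average the vectors of all words that can be mapped to the vocabulary."""
    words = [normalize_word(w) for w in re.findall(r"[A-Za-z]+", sentence)]
    vectors = [TOY_VECTORS[w] for w in words if w is not None]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def coherence_score(sentences):
    """Mean cosine similarity between each pair of adjacent sentences."""
    if len(sentences) < 2:
        return 1.0  # a single sentence is treated as trivially coherent
    vectors = [sentence_vector(s) for s in sentences]
    pairs = list(zip(vectors, vectors[1:]))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


on_topic = ["The cat sleeps", "The dog runs"]
off_topic = ["The cat sleeps", "The stock markte falls"]  # "markte" is misspelled
print(coherence_score(on_topic) > coherence_score(off_topic))
```

In this sketch the misspelled "markte" is mapped to "market" before the sentence vectors are built, so a typo does not change the score. A trained classifier over such sentence-interaction features, as the abstract describes, would replace the simple averaging used here.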

Similar resources

Word and Document Embeddings based on Neural Network Approaches

Data representation is a fundamental task in machine learning. The representation of data affects the performance of the whole machine learning system. For a long time, data representation has been done through feature engineering, with researchers aiming to design better features for specific tasks. Recently, the rapid development of deep learning and representation learning has brought new inspi...

Deep Neural Network Embeddings for Text-Independent Speaker Verification

This paper investigates replacing i-vectors for text-independent speaker verification with embeddings extracted from a feedforward deep neural network. Long-term speaker characteristics are captured in the network by a temporal pooling layer that aggregates over the input speech. This enables the network to be trained to discriminate between speakers from variable-length speech segments. After t...

Text Segmentation based on Semantic Word Embeddings

We explore the use of semantic word embeddings [14, 16, 12] in text segmentation algorithms, including the C99 segmentation algorithm [3, 4] and new algorithms inspired by the distributed word vector representation. By developing a general framework for discussing a class of segmentation objectives, we study the effectiveness of greedy versus exact optimization approaches and suggest a new iter...

Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation

The character-based sequence labeling framework is flexible and efficient for Chinese word segmentation (CWS). Recently, many character-based neural models have been applied to CWS. While they obtain good performance, they have two obvious weaknesses. The first is that they rely heavily on manually designed bigram features, i.e. they are not good at capturing n-gram features automatically. The secon...

Lexicons on Demand: Neural Word Embeddings for Large-Scale Text Analysis

Human language is colored by a broad range of topics, but existing text analysis tools only focus on a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like “bleed” and “punch” to generate the category violence). Empath draws connotations between words and phrases by learning a neural embedding across...

Journal

Journal title: International Journal of Advanced Computer Science and Applications

Year: 2021

ISSN: 2158-107X, 2156-5570

DOI: https://doi.org/10.14569/ijacsa.2021.0120124